Evaluation Strategies for Top-k Queries over Memory-Resident Inverted Indexes
Abstract
Top-k retrieval over main-memory inverted indexes is at the core of many modern applications, from large-scale web search and advertising platforms to text extenders and content management systems. In these systems, queries are evaluated using two major families of algorithms: document-at-a-time (DAAT) and term-at-a-time (TAAT). DAAT and TAAT algorithms have been studied extensively in the research literature, but mostly in disk-based settings. In this paper, we present an analysis and comparison of several DAAT and TAAT algorithms used in Yahoo!'s production platform for online advertising. The low-latency requirements of online advertising systems mandate memory-resident indexes. We compare the performance of several query evaluation algorithms using two real-world ad selection datasets and query workloads. We show how some adaptations of the original algorithms to the main-memory setting have yielded significant performance improvements, reducing running time and cost of serving by 60% in some cases. In these results, both the original and the adapted algorithms were evaluated over memory-resident indexes, so the improvements are algorithmic and do not stem from the use of main-memory indexes in the experiments.
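To make the two families named in the abstract concrete, the following is a minimal sketch of naive TAAT and DAAT top-k evaluation over a toy in-memory index. It is not the paper's implementation and includes none of the main-memory adaptations the abstract refers to; the index contents, the integer scores, the additive scoring, and the function names `taat_top_k` and `daat_top_k` are all assumptions made for illustration.

```python
import heapq
from collections import defaultdict

# Hypothetical toy index: term -> postings list of (doc_id, score),
# sorted by doc_id. Scores are made-up integers for illustration.
INDEX = {
    "cheap":   [(1, 5), (2, 10), (4, 2)],
    "flights": [(2, 7), (3, 9), (4, 4)],
}

def taat_top_k(index, terms, k):
    """Term-at-a-time: consume one postings list fully before the next,
    accumulating partial scores per document."""
    acc = defaultdict(int)
    for term in terms:
        for doc_id, score in index.get(term, []):
            acc[doc_id] += score
    # Highest score first; ties broken by smaller doc_id.
    return heapq.nlargest(k, acc.items(), key=lambda kv: (kv[1], -kv[0]))

def daat_top_k(index, terms, k):
    """Document-at-a-time: advance all postings lists together in doc_id
    order, fully scoring one document before moving to the next."""
    lists = [index[t] for t in terms if t in index]
    cursors = [0] * len(lists)
    heap = []  # min-heap of (score, -doc_id) holding the best k so far
    while True:
        pending = [lst[c][0] for lst, c in zip(lists, cursors) if c < len(lst)]
        if not pending:
            break
        doc_id = min(pending)  # next candidate document
        total = 0
        for i, lst in enumerate(lists):
            c = cursors[i]
            if c < len(lst) and lst[c][0] == doc_id:
                total += lst[c][1]
                cursors[i] += 1
        entry = (total, -doc_id)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            heapq.heapreplace(heap, entry)
    return [(-neg_id, score) for score, neg_id in sorted(heap, reverse=True)]
```

Both functions return the same ranking on the same input. The difference is in memory behavior: TAAT keeps an accumulator for every document it touches, while DAAT keeps only the current best k candidates at the cost of coordinating cursors across all postings lists.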
Similar resources
On the Integration of Structure Indexes and Inverted Lists
Several methods have been proposed to evaluate queries over a native XML DBMS, where the queries specify both path and keyword constraints. These broadly consist of graph traversal approaches, optimized with auxiliary structures known as structure indexes; and approaches based on information-retrieval style inverted lists. However, no published literature addresses methods of combining structur...
Bottom Up and Top Down - Twig Pattern Matching on Indexed Trees
This article describes how to implement efficient memory resident path indexes for semi-structured data. Two techniques are introduced, and they are shown to be significantly faster than previous methods when facing path queries using the descendant axis and wild-cards. The first is conceptually simple and combines inverted lists, selectivity estimation, hit expansion and brute force search. Th...
Faster Path Indexes for Search in XML Data
This article describes how to implement efficient memory resident path indexes for semi-structured data. Two techniques are introduced, and they are shown to be significantly faster than previous methods when facing path queries using the descendant axis and wild-cards. The first is conceptually simple and combines inverted lists, selectivity estimation, hit expansion and brute force search. Th...
Improved Single-Term Top-k Document Retrieval
On natural language text collections, finding the k documents most relevant to a query is generally solved with inverted indexes. On general string collections, however, more sophisticated data structures are necessary. Navarro and Nekrich [SODA 2012] showed that a linear-space index can solve such top-k queries in optimal time O(m + k), where m is the query length. Konow and Navarro [DCC 2013]...
Fast In-Memory XPath Search over Compressed Text and Tree Indexes
A large fraction of an XML document typically consists of text data. The XPath query language allows text search via the equal, contains, and starts-with predicates. Such predicates can efficiently be implemented using a compressed self-index of the document’s text nodes. Most queries, however, contain some parts of querying the text of the document, plus some parts of querying the tree structu...
Journal title:
- PVLDB
Volume 4, issue
Pages -
Publication date: 2011